A model for codon position bias in RNA editing 
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RNA editing can be crucial for the expression of genetic information via inserting, deleting, or 
substituting a few nucleotides at specific positions in an RNA sequence. Within coding regions in 
an RNA sequence, editing usually occurs with a certain bias in choosing the positions of the editing 
sites. In the mitochondrial genes of Physarum polycephalum, many more editing events have been 
observed at the third codon position than at the first and second, while in some plant mitochondria 
the second codon position dominates. Here we propose an evolutionary model that explains this 
bias as the basis of selection at the protein level. The model predicts a distribution of the three 
positions rather close to the experimental observation in Physarum. This suggests that the codon 
position bias in Physarum is mainly a consequence of selection at the protein level. 
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INTRODUCTION 

The central dogma of molecular biology states that the 
transfer of genetic information flows from the genomic 
DNA to messenger RNA to proteins. This implies that 
there are two fundamental processes involved in the fab- 
rication of proteins. The first such process is transcrip- 
tion and it consists of copying the genomic DNA into a 
messenger RNA of identical sequence. Both, the genomic 
DNA and the messenger RNA use an alphabet of four let- 
ters which makes direct copying of one sequence into the 
other possible. The second process is called translation 
and consists of synthesizing a protein, i.e., a sequence 
of amino acids, following the instructions contained in 
the sequence of the messenger RNA. During translation 
groups of three consecutive bases of the messenger RNA 
— the codons — are read in order to determine which of 
the 20 amino acids is to be appended to the protein being 
synthesized. The map from the 4 3 = 64 possible codons 
into the 20 possible amino acids is called the genetic code. 

Before translation into proteins, it has been discovered 
that RNAs transcribed from most eukaryotic genes un- 
dergo a variety of processing events, that convert RNA 
precursors into mature RNAs ready for translation. For 
example, the splicing process removes extended stretches 
of the nucleotide sequences called introns from an RNA 
precursor such that only the remaining RNA sequence 
codes for a protein. 

Besides these normal processing events, many novel 
phenomena that change the content of RNA sequences 
before translation have been discovered in several differ- 
ent organisms llHSSEIlHSHIliQ Th ° SC 
phenomena, which are now coined as RNA editing, con- 
sist of inserting, deleting, or changing individual or a very 
small number of nucleotides. Still, even by changing only 
a few nucleotides RNA editing can significantly alter the 
coding and result in functionally distinct proteins. In 
addition, RNA editing has also been found to occur in 
non-coding regions like tRNAs and rRNAs, and can alter 
the function of these RNA molecules as well. 



For editing within the coding regions, there is often a 
significant codon position bias. For example, it has been 
discovered that in the mitochondrial genes of the slime 
mold Physarum polycephalum, about two thirds of the 
editing events that insert Cs happen at the third posi- 
tions of their respective codons [5|, |6| . On the contrary, 
in mitochondrial genes of Arabidopsis and other plants, 
about 90 percent of the editing events, which in this case 
convert Cs to Us, happen at the first two positions of the 
codons [IlElEi- 

Codon position bias is somewhat surprising since there 
is no obvious relation between the editing which happens 
on the RNA level and codons which in principle have a 
meaning only during translation. The underlying reason 
why the editing machinery might prefer certain codon 
positions for editing is still not understood. 

Here, we propose an evolutionary model which explains 
the editing position bias in mRNAs. In the evolutionary 
scheme, a population of an organism undergoes muta- 
tions which alter the DNA sequences of the individuals 
in the population and change their genetic information. 
Under natural selection, the members in the gene pool 
that carry the genes with higher fitness grow faster and 
increase their relative frequency in the total population. 

The fitness of a gene in general depends on the encoded 
protein sequence as well as other biological parameters. 
In our model we will assume that the only dominant se- 
lection mechanism is the fitness of the resulting protein 
sequence. This assumption is reasonable if editing is rela- 
tively inexpensive, because in this case most editing sites 
will be random in nature rather than involve some other 
biological factors. Comparing our results with the actual 
codon position bias data then reveals if this assumption 
is true or if there are other, more fundamental selection 
mechanisms at work in a given organism. 

The two cases we apply our model to are the mito- 
chondrial mRNAs of Physarum polycephalum and of the 
plants Arabidopsis thaliana, Brassica napus, and Oryza 
sativa. Abundant editing events have been observed in 
these organisms. In Physarum mitochondrial mRNAs, 
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one in every 25 bases is edited on average, which leads 
to about 1 in every 8 codons being edited on average. 
In plant mitochondrial mRNAs, about 2% of the nu- 
cleotides arc edited on average. In the remainder of this 
article, we briefly describe the editing events in these or- 
ganisms. Then we focus first on the evolutionary model 
for Physarum. Following similar approaches for the case 
of Physarum, we then move on to the case of plant mi- 
tochondria. 

Mitochondrial RNAs of Physarum polycephalum have 
been found to be edited extensively by insertions of 
mono-(C,U) and dinucleotides (CU,GU,UA,AA,GC, and 
UU) [5J, ||| . Among the editing events within mRNAs of 
Physarum polycephalum, C insertions are the most fre- 
quent events. In plant mitochondria, the most abundant 
events are C to U conversions. Thus, we only focus on 
these most frequent editing events in this communication. 

For C insertions in 11 mitochondrial mRNAs of 
Physarum polycephalum |l5|. some editing positions are 
ambiguous as nucleotide Cs are inserted right next to an- 
other C. Excluding these ambiguous editing sites, 227 C 
insertions are observed within the coding regions. Among 
these, 58 and 24 insertions occur at the first and second 
positions of the codons, respectively. The remaining 145 
insertions are found at the third codon position. In this 
case, the third position is the most favorable and the 
second position is the least favorable as already noted 
in previous work comprising only 5 mitochondrial mR- 
NAs [3. 

In plant mitochondria, the dominant editing event is 
a C to U conversion. Thus, there is no ambiguity in de- 
termining the codon position of these events. The edit- 
ing codon position preferences of the three plant mito- 
chondria studied here are quite similar. In Arabidopsis, 
154/236/51 C to U conversions have been observed at 
the three positions . In Oryza 0] and Brassica 0] , 
142/243/33 and 174/230/77 conversions have been ob- 
served respectively. This suggests that the editing posi- 
tion bias stems from the same mechanism for the three 
organisms. Notice that the order of the position bias is 
the opposite to that of Physarum. 

We now develop our model for the insertional edit- 
ing events of Physarum. This model will later be easily 
adapted to the case of substitutional editing such as in 
the plant mitochondria. 

The scheme of our evolutionary model is the follow- 
ing. During the proliferation of Physarum, nucleotide 
mutations occur at random positions in the mitochon- 
drial DNA sequence. These mutations can be substi- 
tutions, insertions, and deletions. Although we model 
all three types of mutations, we are mostly interested 
in deletions in the DNA sequence. Most offspring with 
a deletion die because the protein produced according 
to the mutated DNA sequence is out of frame from the 
site of the deletion on and thus cannot function properly. 
However, some few deletions may be accepted by the edit- 




AAA • • ACA «-» ACC «->• ACG «-» ACU • • CCG • • 
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• CCA • • CCG • ACG • • • ACC • • CCC • • 

FIG. 1: Mutations among codons of the evolutionary model 
for Physarum. Solid lines stand for mutations between regular 
codons. Dashed lines are for mutations between a regular 
codon and a codon with an editing site. 

ing machinery which would insert back nucleotides C to 
those positions of mutation and thus rescue them. Un- 
der natural selection, these offspring would survive if the 
resulting protein could function properly in place of the 
original one. 

The genetic code, i.e., the rules of translation from 
codons to amino acids, is organized such that often the 
third codon position is irrelevant to the identity of the 
amino acid interpreted into the protein sequence. Thus, 
the third codon position is the least sensitive to nu- 
cleotide changes to C generated by editing events. There- 
fore, in this mutation-selection model, random deletions 
at the third codon position survive much better than at 
the first and second position. 

Beyond this qualitative picture, the random mutations 
and selection can be rigorously formulated in the follow- 
ing way using as an example the codon ACG. For random 
mutations, we assume the substitutions and deletions oc- 
cur randomly at one of the three positions with certain 
mutation rates /i s and fi e . The random substitutions re- 
sult in 9 regular codons by converting the base in each 
position into 3 other nucleotides (Fig. . The random 
deletions generate 3 codons with editing sites CCG, or 
ACG, or ACC, where the bar stands for a vacant site in 
the DNA sequence after mutation. As an amino acid is 
expressed, this vacant site will be treated as a C to repre- 
sent the editing event in which a nucleotide C is inserted 
before translation. Note that /i e is the effective rate for 
a deletion that is accepted by the editing machinery and 
rescued by inserting a C. This rate is in general smaller 
than the base deletion rate. For codons with a vacant 
site, they also undergo insertions which randomly insert 
one of the four nucleotides (A,C,G,U) back into the va- 
cant site with a mutation rate fj, e , which we take to be 
the same as the random deletion rate. 

The basic assumption of our model is that selection 
happens only at the amino acid level. Thus, the fitness 
of a codon, i.e., the growth rate, depends only on the 
similarity of the new amino acid coded for by the mutated 
codon to the original amino acid. To be specific, we will 
use the amino acid scoring matrix BLOSUM62 to 
quantify the similarities between amino acids. However, 
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our results are not very sensitive to the choice of this 
matrix. As an example, the initial codon ACG is assigned 
a large growth rate of 6 because it translates into the 
desired amino acid threonine. The mutated codon CCG 
would express the amino acid proline and is assigned a 
growth rate of -1 since proline behaves quite differently 
from thrieonine according to the BLOSUM62 matrix. 

Apart from the fitness at the amino acids level, a codon 
created by an editing event is considered as less fit than 
a regular one because the editing machinery presumably 
uses up resources in order to perform the editing. We 
thus reduce the growth rate for a codon created by an 
editing event by an editing cost c. 

In order to cast this model in mathematical terms we 
define the fraction of codon i as Xi, where i = 1 ~ 64 
is for the 64 regular codons and i — 65 ~ 112 is for 
all codons containing an editing site. The corresponding 
growth rates and editing costs are defined as and Ci, 
where gi is obtained from the BLOSUM62 matrix and 
c, = c for i = 65 ~ 112 and zero otherwise. We then 
write the general model as 
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where ^ is the general mutation rate from codon i to 
codon j, and is set as fi s for substitutions, /x e for deletions 
and insertions, and for codons that cannot be mutated 
into each other. 

We will look at the distribution of regular codons and 
codons with editing sites in this model at equilibrium. 
Although the last term in this equation is non-linear, 
Eigen's theory [lil Hij tells us that the equilibrium dis- 
tribution is proportional to the eigenvector of the ma- 
trix Mij for the largest eigenvalue. Thus, obtaining the 
equilibrium distribution is merely a matter of numeri- 
cally finding the eigenvector for the largest eigenvalue of 
a 112 x 112 matrix which we do with Mathematica |20| . 

To obtain some insight into this evolutionary model, 
we again first look at the case where the initial codon 
is ACG. We will focus on the limit of strong selection 
in which most results become largely independent of the 
individual mutation rates fi s and /u e . We obtain this 
strong selection limit by setting fi s and /i e to be much 
smaller than 1 since the growth rates that set the time 
scale are of order 1. To be more specific, we set ji s = /x e = 
10 -6 because the relative magnitude of fi s to \i e only 
slightly affects the result. As expected, the equilibrium 
distribution shows that only those codons that translate 
into the original amino acid survive. Thus, among the 
codons with editing sites, only ACA, ACC, ACG, ACU, 
AGC survive since the genetic code depends only on the 



first two positions in this case. We also find that the ratio 
between editing at the second position and at the third 
position is about one. Thus, we arrive at a simple scheme 
in the strong selection limit that only the mutations that 
result in the same amino acid survive and each position 
that survives contributes equally at equilibrium. 

This simple scheme allows us to quickly predict the ra- 
tio among edited positions for each codon. The overall 
ratio is then easily obtained by considering all possible 
codons with proper weights. The weights of the codons 
are determined by the experimentally observed unedited 
codon frequencies in the DNA sequence. This scheme re- 
sults in the ratio of editing positions as 17/11/72 percent 
which is rather close to the experimental observed ratio 
25/11/64 percent. 

The above result is obtained without any editing cost, 
i.e., c=0. We find that the role of the editing cost is 
mainly to reduce the fraction of the total number of 
codons with editing sites; the ratio among the editing 
positions is only slightly changed upon introducing an 
editing cost. Thus, by adjusting the editing cost, we can 
tune the fraction of codons with C insertions to match 
the experimental result of about 7%. In this case, the 
overall ratio of editing positions is about 19/15/66 per- 
cent, which is still very close to the experimental result. 

We conclude that this simple evolutionary model pro- 
vides a possible scenario for the editing codon position 
bias. This suggests that in Physarum the majority of 
editing events might indeed be subject to no other bio- 
logical constraint but the fitness of the resulting protein 
sequence. Thus, editing events might be randomly ac- 
quired. 

This is also consistent with the evolution of RNA edit- 
ing in Physarum and its close relatives [2lJ. Abun- 
dant editing has also been observed in the organisms 
Didymium nigripes and Stemonitis flavogenita. However, 
in Arcyria cinerea and Clastoderma debaryanum, only 
few C insertions or even none are observed. This indi- 
cates that the editing machinery in Arcyria and Clasto- 
derma is not yet fully developed. The fact that Physarum 
has acquired so many editing sites since the divergence 
from Arcyria and Clastoderma suggests that editing in 
Physarum is less expensive and that thus random editing 
is likely. 

To study plant mitochondrial genes, we slightly modify 
our evolutionary model as follows. The random substitu- 
tions between regular codons remain the same as in the 
case of Physarum, and they occur with a rate /x s . As 
to the editing events that convert Cs to Us, we assume 
that for certain bases that mutate to Cs, the editing ma- 
chinery would recognize these sites and convert them to 
Us before translation. Here, we use U to represent such 
a mutation. Thus, as an example, the codon ACG can 
mutate into UCG, AUG, and ACU as the correspond- 
ing nucleotide mutate to a C and then is recognized by 
the editing machinery. This process involving an editing 
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event occurs with a mutation rate /j, e . Again, the back 
mutation that converts a U to one of the four nucleotides 
(A,C,G,U) is modeled to also occur with the same mu- 
tation rate /i e . Mathematically, the evolutionary model 
for Physarum and plant mitochondria are identical. 

This evolutionary model, when applied to plant mito- 
chondrial genes, does not predict the correct preference 
in the codon positions for editing. Instead, it still pre- 
dicts the majority of editing sites to occur at the third 
codon position. This suggests that in plant mitochon- 
drial genes editing events do not just happen randomly, 
but that there exist some other biological mechanisms 
that bias the choice of editing sites. 

This assertion is consistent with what is known about 
the editing mechanism in subcellular organelles of plants. 
Recent experiments show that different editing sites in 
subcellular organelles of plants usually require different 
proteins in the editing process j^- In such a case, each 
editing site is quite expensive in terms of resources the 
cell has to provide. It is highly unlikely that such expen- 
sive editing sites are randomly acquired. The existence of 
editing events in spite of the fact that they are expensive 
is an indication that these editing sites are significantly 
involved in biological functions beyond simply providing 
the correct protein sequence. 

In summary, the closeness of the result of our evo- 
lutionary model to the experimental observations in 
Physarum suggests that the editing position bias in 
Physarum is mainly a consequence of random mutations 
with selection at the protein level. In the case of plant 
mitochondria, the disagreement between the evolution- 
ary model and the observations implies that some other 
biological factors besides selection at the protein level 
must also play a role in the evolution, so that the ran- 
dom mutation-selection scheme does not fit in these or- 
ganisms. 
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